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These notes discuss, in a style intended for physicists, how to average data and fit it to 
some functional form. I try to make clear what is being calculated, what assumptions are 
being made, and to give a derivation of results rather than just quote them. The aim is 
put a lot useful pedagogical material together in a convenient place. This manuscript is a 
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I. INTRODUCTION 

These notes describe how to average and fit numerical data that you have obtained, presumably 
by some simulation. 

Typically you will generate a set of values Xi, yi, ■ ■ ■ , i = 1, ■ ■ ■ N , where N is the number of 
measurements. The first thing you will want to do is to estimate various average values, and 
determine error bars on those estimates. As we shall see, this is straightforward if one wants 
to compute a single average, e.g. {x), but not quite so easy for more complicated averages such 
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as fluctuations in a quantity, (x^) — (x)^, or combinations of measured values such as 
Averaging of data will be discussed in Sec. II. 

Having obtained several good data points with error bars, you might want to fit this data to 
some model. Techniques for fitting data will be described in the second part of these notes in 
Sec. Ill 

I find that the books on these topics usually fall into one of two camps. At one extreme, the 
books for physicists don't discuss all that is needed and rarely prove the results that they quote. At 
the other extreme, the books for mathematicians presumably prove everything but are written in 
a style of lemmas, proofs, e's and 5's, and unfamiliar notation, which is intimidating to physicists. 
One exception, which finds a good middle ground, is Numerical Recipes [1] and the discussion of 
fitting given here is certainly influenced by Chap. 15 of that book. In these notes I aim to be 
fairly complete and also to derive the results I use, while the style is that of a physicist writing 
for physicists. I also include scripts in python, perl, and gnuplot to perform certain tasks in data 
analysis and fitting. For these reasons, these notes are perhaps rather lengthy. Nonetheless, I hope, 
that they will provide a useful reference. 



II. AVERAGES AND ERROR BARS 
A. Basic Analysis 

Suppose we have a set of data from a simulation, Xj, = 1, • • • , A^), which we shall refer to as 
a sample of data. This data will have some random noise so the xi are not all equal. Rather they 
are governed by a distribution P{x), which we don't know. 

The distribution is normalized, 

/oo 
P{x)dx = l, (1) 
-oo 

and is usefully characterized by its moments, where the n-th moment is defined by 

/oo 
x" P{x) dx . (2) 
-oo 

We will denote the average over the exact distribution by angular brackets. Of particular interest 
are the first and second moments from which one forms the mean ^ and variance o"^, by 

H = (x) (3a) 

a^^{{x-{x)f) = {x^)-{x)\ (3b) 



4 



The term "standard deviation" is used for c, the square root of the variance. 

In this section we will estimate the mean (x), and the uncertainty in our estimate, from the N 
data points Xj. The determination of more complicated averages and resulting error bars will be 
discussed in Sec. II B 

In order to obtain error bars we need to assume that the data are uncorrelated with each other. 
This is a crucial assumption, without which it is very difficult to proceed. However, it is not always 
clear if the data points are truly independent of each other; some correlations may be present but 
not immediately obvious. Here, we take the usual approach of assuming that even if there are 
some correlations, they are sufficiently weak so as not to significantly perturb the results of the 
analysis. In Monte Carlo simulations, measurements which differ by a sufficiently large number of 
Monte Carlo sweeps will be uncorrelated. More precisely the difference in sweep numbers should 
be greater than a "relaxation time" . This is exploited in the "binning" method in which the data 
used in the analysis is not the individual measurements, but rather an average over measurements 
during a range of Monte Carlo sweeps, called a "bin". If the bin size is greater than the relaxation 
time, results from adjacent bins will be (almost) uncorrelated. A pedagogical treatment of binning 
has been given by Ambegaokar and Troyer [2]. Alternatively, one can do independent Monte Carlo 
runs, requilibrating each time, and use, as individual data in the analysis, the average from each 
run. 

The information from the data is usefully encoded in two parameters, the sample mean x and 
the sample standard deviation s which are defined by^ 



In statistics, notation is often confusing but crucial to understand. Here, an average indicated 
by an over-bar, is an average over the sample of N data points. This is to be distinguished from 
an exact average over the distribution (•••), as in Eqs. (3a) and (3b). The latter is, however, just 
a theoretical construct since we don't know the distribution P{x), only the set of data points Xi 



^ The factor of A'^ — 1 rather than A*' in the expression for the sample variance in Eq. (4b) needs a couple of comments. 
Firstly, the final answer for the error bar on the mean, Eq. (16) below, will be independent of how this intermediate 
quantity is defined. Secondly, the A'^ terms in Eq. (4b) are not all independent since x, which is itself given by the 
Xi, is subtracted. Rather, as will be discussed more in the section on fitting, Sec. Ill, there are really only A'" — 1 
independent variables (called the "number of degrees of freedom" in the fitting context) and so dividing by TV — 1 
rather than A'^ has a rational basis. However, this is not essential and many authors divide by A' in their definition 
of the sample variance. 




(4a) 




(4b) 
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which have been sampled from it. 

Next we derive two simple results which will be useful later: 

1. The mean of the sum of N independent variables with the same distribution is N times the 
mean of a single variable, and 

2. The variance of the sum of independent variables with the same distribution is times 
the variance of a single variable. 



TV 

" ii 



The result for the mean is obvious since, defining X = X]i=i ^ 

N 



(X) = Y,{x^) = N{x,) \ = Nf,.\ (5) 



i=l 

The result for the standard deviation needs a little more work: 



4 ^ {X') - {Xf (6a) 

N 
N 

1=1 

= N{{x')-{xf) (6d) 

(6e) 



To get from Eq. (6b) to Eq. (6c) we note that, for i ^ j, (xiXj) = {xi){xj) since Xi and Xj are 
assumed to be statistically independent. (This is where the statistical independence of the data is 
needed.) 

If the means and standard deviations are not all the same, then the above results generalize to 

AT 



{X)=Y,^^^, (7a) 

i=l 

{<r'x) = j:^f- (7b) 



i=l 

Now we describe an important thought experiment. Let's suppose that we could repeat the set 
of N measurements very many many times, each time obtaining a value of the sample average x. 
From these results we could construct a distribution, P{x), for the sample average as shown in 
Fig. 1. 
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p(x) A 




1^ 



FIG. 1: The distribution of results for the sample mean x obtained by repeating the measurements of the 
N data points Xi many times. The average of this distribution is /i, the exact average value of x. The 
mean, x, obtained from one sample of data typically differs from fi by an amount of order cr^-, the standard 
deviation of the distribution Pix). 



If we do enough repetitions we are effectively averaging over the exact distribution. Hence the 
average of the sample mean, x, over very many repetitions of the data, is given by 

1 ^ 



(8) 



i.e. it is the exact average over the distribution of x, as one would intuitively expect, see Fig. 1. 
Eq. (8) also follows from Eq. (5) by noting that x = X/N. 



In fact, though, we have only the one set of data, so we can not determine /i exactly. However, 
Eq. (8) shows that 



the best estimate of fi is x. 



(9) 



i.e. the sample mean, since averaging the sample mean over many repetitions of the N data points 
gives the true mean of the distribution, An estimate like this, which gives the exact result if 



averaged over many repetitions of the experiment, is said to be unbiased. 



We would also like an estimate of the uncertainty, or "error bar" , in our estimate of x for the 
exact average /i. We take a-x, the standard deviation in x (obtained if one did many repetitions of 
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the measurements), to be the uncertainty, or error bar, in x. The reason is that ax is the width 
of the distribution P{x), shown in Fig. 1, so a single estimate x typically differs from the exact 
result fj, by an amount of this order. The variance cj|- is given by 

^2 



N 



(10) 



which follows from Eq. (6e) since x = X/N. 

The problem with Eq. (10) is that we don't know o"^ since it is a function of the exact 
distribution P{x). We do, however, know the sample variance s^, see Eq. (4b), and the average of 
this over many repetitions of the N data points, is equal to cr^ since 



N N 



■t=i ^ ^ 1=1 j=i 



a 



(11a) 

(lib) 

(11c) 
(lid) 



To get from Eq. (11a) to Eq. (lib), we have separated the terms with i = j in the last term of 
Eq. (11a) from those with i ^ j, and used the fact that each of the Xi is chosen from the same 
distribution and is statistically independent of the others. It follows from Eq. (lid) that 



the best estimate of is , 



(12) 



since averaging s over many repetitions of N data points gives a . The estimate for a in Eq. (12) 
is therefore unbiased. 

Combining Eqs. (10) and (12) gives 



the best estimate of cjI- is — , 

^ AT 



(13) 



since this estimate is also unbiased. We have now obtained, using only information from the data, 
that the mean is given by 



IJ, = X ± ax, 



(14) 



where 



a^n 



N 



(15) 
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which we can write exphcitly in terms of the data points as 




(16) 



Remember that x and s are the mean and standard deviation of the (one set) of data that is 
available to us, see Eqs. (4a) and (4b). 

As an example, suppose N = 5 and the data points are 



Xi = 10,11,12,13,14, 



(17) 



(not very random looking data it must be admitted!). Then, from Eq. (4a) we have x = 12, and 
from Eq. (4b) 



,2=1 [(_2)2 + (_1)2 + 0^ + 12 + 22] 



(18) 



Hence, from Eq. (15), 



1/5 1 



^/5 V 2 ^2' 



so 



H = X ziz (Jx = 12 ziz —j=. 

v2 



(19) 



(20) 



How does the error bar decrease with the number of statistically independent data points A^? 
Equation (lid) states that the expectation value of is equal to and hence, from Eq. (15), we 
see that 



the error bar in the mean goes down like l/yN. 



Hence, to reduce the error bar by a factor of 10 one needs 100 times as much data. This is 
discouraging, but is a fact of life when dealing with random noise. 

For Eq. (15) to be really useful we need to know the probability that the true answer /i lies more 
than a-x away from our estimate x. Fortunately, for large A^, the central limit theorem, derived in 
Appendix A, tells us (for distributions where the first two moments are finite) that the distribution 
of X is a Gaussian. For this distribution we know that the probability of finding a result more than 
one standard deviation away from the mean is 32%, more than two standard deviations is 4.5% 
and more than three standard deviations is 0.3%. Hence we expect that most of the time x will 
be within Gx of the correct result and only occasionally will be more than two times ax from 
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it. Even if is not very large, so there are some deviations from the Gaussian form, the above 
numbers are often a reasonable guide. 

However, as emphasized in appendix A, distributions which occur in nature typically have much 
more weight in the tails than a Gaussian. As a result, the weight in the tails of the distribution of 
the sum can also be much larger than for a Gaussian even for quite large values of A^, see Fig. 6. It 
follows that the probability of an "outlier" can be much higher than that predicted for a Gaussian 
distribution, as anyone who has invested in the stock market knows well! 

B. Advanced Analysis 

In Sec. II A we learned how to estimate a simple average, such as fix = i-c), plus the error bar in 
that quantity, from a set of data Xj. Trivially this method also applies to a linear combination of 
different averages, fXx^f^y, ■ ■ ■ etc. However, we often need more complicated, non-linear functions 
of averages. One example is the fluctuations in a quantity, i.e. (x^) — (x)^. Another example is a 
dimensionless combination of moments, which gives information about the shape of a distribution 
independent of its overall scale. Such quantities are very popular in finite-size scaling (FSS) analyses 
since the FSS form is simpler than for quantities with dimension. An popular example, first 
proposed by Binder, is which is known as the "kurtosis" (frequently a factor of 3 is 

subtracted to make it zero for a Gaussian). 

Hence, in this section we consider how to determine non-linear functions of averages of one or 
more variables, /(%, /iz, • • • ), where 

l^y = iy) , (21) 
etc. For example, the two quantities mentioned in the previous paragraph correspond to 

f{l^y,l^z) = fJ'y - fJ'l, (22) 

with y = and z = x and 

/(^„^,) = ^, (23) 

with y = x^ and z = x^. 

The natural estimate of /(/-fy, j-Lz) from the sample data is clearly f(lj,z). However, it will take 
some more thought to estimate the error bar in this quantity. The traditional way of doing this 
is called "error propagation" , described in Sec. II B 1 below. However, it is now more common to 
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use either "jackknife" or "bootstrap" procedures, described in Sees. IIB2 and JIB 3. At the price 
of some additional computation, which is no difficulty when done on a modern computer (though 
it would have been tedious in the old days when statistics calculations were done by hand), these 
methods automate the calculation of the error bar. 

Furthermore, the estimate of /(/iy,/^^) turns out to have some bias if / is a non-linear function. 
Usually this is small effect because it is order 1/A^, see for example Eq. (30) below, whereas the 
statistical error is of order 1/^/N. Since N is usually large, the bias is generally much less than 
the statistical error and so can generally be neglected. In any case, the jackknife and bootstrap 
methods also enable one to eliminate the leading (~ 1/-/V) contribution to the bias in a automatic 
fashion. 

1. Traditional method 

First we will discuss the traditional method, known as error propagation, to compute the error 
bar and bias. We expand f{y,z) about f{fiy,fiz) up to second order in the deviations: 

f{y,z) = f{f,y,f,z) + {d,J)6y + id,J)6^+^{dl^J)5lHdl,J^ , (24) 

where 

etc. 

The terms of first order in the 5' s in Eq. (24) give the leading contribution to the error, but 
would average to zero if the procedure were to be repeated many times. However, the terms of 
second order do not average to zero and so give the leading contribution to the bias. We now 
estimate that bias. 

Averaging Eq. (24) over many repetitions, and noting that 

(4) = (y') - (y)' ^ 4 (<^l) = (^') - (^)' ^ 4, (<5^<^r) = (y^)-(y)(^) = 4^, (26) 

we get 

{f{y,z)) - fifiy, M,) = 1 (d^^J) 4 + (dl^J) 4^ + I {dl^J) al . (27) 

As shown in Eq. (13) our estimate of is times the sample variance (which we now call Syy), 
and similarly for cj|. In the same way, our estimate of a~ is times the sample covariance of 



11 



y and z, defined by 



N 



1 ^ _ _ 

1=1 



(28) 



Hence, from Eq. (27), we have 



2 i'^/J.yfiyf) ^yy ~^ (^MhMz-^) ^yz ~^ 2 ^^/^2Mz-^) ^zz 



(29) 



where the leading contribution to the bias is given by the term. Note that the bias term is 
"self-averaging", i.e. the fluctuations in it are small relative to the average (by a factor of 1/y/N) 
when averaging over many repetitions of the data. It follows from Eq. (29) that if one wants to 
eliminate the leading contribution to the bias one should 



estimate /(/iy,/!^) from/(y,z)-^ 



r, i^fiyfiyf) ^yy + i^fiyfi^f) ^yz + 9 ((^Hz/izf) ^ 



(30) 



As claimed earlier, the bias correction is of order Note that it vanishes if / is a linear function, 
as shown in Sec. II A. The generalization to functions of more than two averages, f{ny, ^z, Mw? ' ' ' )> 
is obvious. 

Next we discuss the leading error in using f{y,z) as an estimate for f{fj.y, ^jlz)- This comes from 
the terms linear in the (5's in Eq. (24). Just including these terms we have 

{f{y,z)) = f(,f^y,f^z), (31a) 
( f{y,z) ) = f{f^y,f^z) + {d^yff (4) + 2{d,J) (d^J) {6y6^) + {d^jf . (31b) 

Hence 



As above, we use Syy/N as an estimate of {6'^) and similarly for the other terms. Hence 



the best estimate of a} is - (d^jf sly + 2{d^J) {8^ J) + {O^^jf 4 



(32) 



(33) 



This estimate is unbiased to leading order in N. Note that we need to keep track not only of 



fluctuations in y and z, characterized by their variances sly and s^^, but also cross correlations 
between y and z, characterized by their covariance s^^. 
Hence, still to leading order in A^, we get 



fifJ-yif^z) = f{y,z) ±(7/ 



(34) 
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where we estimate the error bar cjj from Eq. (33) which shows that it is of order Again, 
the generahzation to functions of more than two averages is obvious. 

Note that in the simple case studied in Sec. II A where there is only one set of variables Xi and 
/ = /i^, Eq. (29) tells us that there is no bias, which is correct, and Eq. (33) gives an expression 
for the error bar which agrees with Eq. (15). 

In Eqs. (30) and (33) we need to keep track how errors in the individual quantities like y 
propagate to the estimate of the function /. This requires inputting by hand the various partial 
derivatives into the analysis program, and keeping track of all the variances and covariances. In the 
next two sections we see how resampling the data automatically takes account of error propagation 
without needing to input the partial derivatives and keep track of variances and covariances. These 
approaches, known as jackknife and bootstrap, provide a fully automatic method of determining 
error bars and bias. 



2. Jackknife 



We define the i-th. jackknife estimate, yf (z = 1, 2, • • • , N) to be the average over all data in the 
sample except the point i, i.e. 



N 



31 Y.yj- 



(35) 



We also define corresponding jackknife estimates of the function / (again for concreteness we will 
assume that / is a function of just 2 averages but the generalization will be obvious): 



// = /(y/,z/). 



(36) 



In other words, we use the jackknife values, rather than the sample means, y,z, as the 



arguments of /. For example a jackknife estimate of the Binder ratio {x ) / {x ) is 



fi 



(37) 



The overall jackknife estimate of f{fj,y,^z) is then the average over the N jackknife estimates //: 



N 



N 



i=l 



(38) 



It is straightforward to show that if / is a linear function of fiy and /x^ then /"^ = f{y, z), i.e. the 
jackknife and standard averages are identical. However, when / is not a linear function, so there 
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is bias, there is a difference, and we will now show the resampling carried out in the jackknife 
method can be used to determine bias and error bars in an automated way. 
We proceed as for the derivation of Eq. (29), which we now write as 

f{i^v,i^z) = {f{y,z)) - 4 - ^ + • • • ' (^9) 

where A is the term in rectangular brackets in Eq. (29), and we have added the next order correction. 
The jackknife data sets have — 1 points with the same distribution as the N points in the actual 
distribution, and so the bias in the jackknife average will be of the same form, with the same values 
of A and B, but with N replaced by — 1, i.e. 

/(z^.,/^.) = in-j^,- • • • • (40) 

We can therefore eliminate the leading contribution to the bias by forming an appropriate linear 
combination of f(lj,z) and /•-', namely 

fi^^y,^,) = N{f{y,z))-{N-l)(p) + o(^-^^ . (41) 

It follows that, to eliminate the leading bias without computing partial derivatives, one should 



estimate /(//^ , from Nf{y, z)-{N-l)fJ. (42) 



The bias is then of order l/N"^. However, as mentioned earlier, bias is usually not a big problem 
because, even without eliminating the leading contribution, the bias is of order 1/N whereas the 
statistical error is of order which is much bigger if N is large. In most cases, therefore, N 

is sufficiently large that one can use either the usual average f{y,'z), or the jackknife average f"^ 
in Eq. (38), to estimate f{fiy,fiz), since the difference between them will be much smaller than 
the statistical error. In other words, elimination of the leading bias using Eq. (42) is usually not 
necessary. 

Next we show that the jackknife method gives error bars, which agree with Eq. (33) but without 
the need to explicitly keep track of the partial derivatives and the variances and covariances. 
We define the variance of the jackknife averages by 

2 



where 



a%^ifjf-(fA , (43) 



-I N 



N 

i=l 
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Using Eqs. (36) and (38), we expand f-^ away from the exact result fiHyjfJ^z)- Just including the 
leading contribution gives 

— 1 ^ 

fJ - /(/i,,^,) = - ^ [(d^J) {yi - f,y) + {d,J) [4 - /X,)] 



1 ^ 

r^I) E [(^M./) {^(y - My) - (2/i - l^y)} + (5;../) {iV( 



(d^yf) {y - Ms/) + {^^lJ) {z - f^z) ■ 



[Zi - IJ^z)}] 

(45) 



Similarly we find 



j=l 



f{^J'y, M^) + 2/(/iy, /i^) [{d^,J) (y - %) + (5^, /) (2; - ^2)] 



yy 



N{N - 1) 



N{N - 1) 



+ 2(5^J)(5^J) 



{y - ^j,y){z - ^Mz) + 



■-yz 



N{N -1) 

Hence, from Eqs. (43) and (45), the variance in the jackknife estimates is given by 

1 



(46) 



'^f' N{N - 1) 



yzj ) 



(47) 



which is just 1/{N — 1) times cjj, the estimate of the square of the error bar in f{y,z) given in 
Eq. (33). Hence 



the jackknife estimate for o"/ is y/ N — 1 a^j . 



(48) 



Note that this is directly obtained from the jackknife estimates without having to put in the partial 



derivatives by hand. Note too that the \/N — 1 factor is in the numerator whereas the factor of 
y/N in Eq. (15) is in the denominator. Intuitively the reason for this difference is that the jackknife 
estimates are very close since they would all be equal except that each one omits just one data 
point. 

If is very large, roundoff errors could become significant from having to subtract large, almost 
equal, numbers to get the error bar from the jackknife method. It is then advisable to group the 
N data points into A'group groups (or "bins") of data and take, as individual data points in the 
jackknife analysis, the average of the data in each group. The above results clearly go through 
with replaced by A^'group- 
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To summarize this subsection, to estimate f{fMy,fiz) one can use either f(jj,'z) or the jackknife 
average f-^ in Eq. (38). The error bar in this estimate, o"/, is related to the standard deviation in 
the jackknife estimates aj-j by Eq. (48). 

3. Bootstrap 

The bootstrap, hke the jackknife, is a resamphng of the N data points Whereas jackknife 
considers N new data sets, each of containing all the original data points minus one, bootstrap 
uses iVboot data sets each containing N points obtained by random (Monte Carlo) sampling of the 
original set of N points. During the Monte Carlo sampling, the probability that a data point is 
picked is irrespective of whether it has been picked before. (In the statistics literature this 
is called picking from a set "with replacement".) Hence a given data point Xi will, on average, 
appear once in each Monte Carlo-generated data set, but may appear not at all, or twice, and so 
on. The probability that Xi appears rii times is close to a Poisson distribution with mean unity. 
However, it is not exactly Poissonian because of the constraint in Eq. (49) below. It turns out that 
we shall need to include the deviation from the Poisson distribution even for large A^. We shall use 
the term "bootstrap" data sets to denote the Monte Carlo-generated data sets. 

More precisely, let us suppose that the number of times Xj appears in a bootstrap data set is 
nj. Since each bootstrap dataset contains exactly N data points, we have the constraint 

N 

Y^ni = N. (49) 

i=l 

Consider one of the N variables Xj. Each time we generate an element in a bootstrap dataset the 
probability that it is xi is 1/iV, which we will denote by p. From standard probability theory, the 
probability that Xi occurs Ui times is given by a binomial distribution 

The mean and standard deviation of a binomial distribution are given by 

[n^]^o=Np = l, (51) 

k'Imc - NIL = m^-p) = ^-^: (52) 

where [. . denotes an exact average over bootstrap samples (for a fixed original data set Xj). 
For — )• oo, the binomial distribution goes over to a Poisson distribution, for which the factor of 
in Eq. (52) does not appear. We assume that A'boot is sufficiently large that the bootstrap 
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average we carry out reproduces this result with sufficient accuracy. Later, we will discuss what 
values for A'^boot are sufficient in practice. Because of the constraint in Eq. (49), rij and rij (with 
i / j) are not independent and we find, by squaring Eq. (49) and using Eqs. (51) and (52), that 

K^jImc - KImcK'Imc = / j) ■ (53) 

First of all we just consider the simple average /i^ = (x), for which, of course, the standard 
methods in Sec. II A suffice, so bootstrap is not necessary. However, this will show how to get 
averages and error bars in a simple case, which we will then generalize to non-linear functions of 
averages. 

We denote the average of x for a given bootstrap data set by x^, where a runs from 1 to A'boot) 

i.e. 

N 



N 
1=1 

We then compute the bootstrap average of the mean of x and the bootstrap variance in the mean, 
by averaging over all the bootstrap data sets. We assume that A^boot is large enough for the 
bootstrap average to be exact, so we can use Eqs. (52) and (53). The result is 

A^boot N ^ N 



X^ = ^ 



■ DOOt 1 1 



a=l 1=1 1=1 



where 



-l^ = (-^)' - H ' = ]^ (l - ^) E - ]^ E ' (56) 
(-^)' - ]^ E [(-")1 • (57) 

J Vboot -, L J MC 

a=l 

We now average Eqs. (55) and (56) over many repetitions of the original data set Xj. Averaging 
Eq. (55) gives 

(^) = (x) = {x) = /i, . (58) 

This shows that the bootstrap average x^ is an unbiased estimate of the exact average fix- Aver- 
aging Eq. (56) gives 



/ 2 \ N -1 ^ N-1 2 ,„ , 



where we used Eq. (10) to get the last expression. Since ax is the uncertainty in the sample mean, 
we see that 



N 

the bootstrap estimate of cr^ is \l — OxB . 



(60) 
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Remember that g^b is the standard deviation of the bootstrap data sets. Usuahy is sufficiently 
large that the square root in Eq. (60) can be replaced by unity. 

As for the jackknife, these results can be generalized to finding the error bar in some possibly 
non-linear function, f{ij,y,fiz), rather than for fj^x- One computes the bootstrap estimates for 
f {fly, Hz), which are 



B B\ 



(61) 



In other words, we use the bootstrap values, y^,z^, rather than the sample means, y,z, as the 
arguments of /. The final bootstrap estimate for f{fiy,fj,z) is the average of these, i.e. 



a=l 



(62) 



Following the same methods in the jackknife section, one obtains the error bar, u/, in f{fiy,fiz 
The result is 



the bootstrap estimate for df is 



N 



N - 1 



a fB 



where 



a^B 



if 



B\2 



(63) 



(64) 



is the variance of the bootstrap estimates. Here 



in 



oot 



(65) 



Usually N is large enough that the factor of N/ (N — 1) is Eq. (63) can be replaced by unity. 
Equation (63) corresponds to the result Eq. (60) which we derived for the special case of / = fix- 
Again, following the same path as in the jackknife section, it is straightforward to show that the 
bias of the estimates in Eqs. (62) and (63) is of order 1/A^ and so vanishes for N — t- oo. However, 
if N is not too large it may be useful to eliminate the leading contribution to the bias in the mean, 
as we did for jackknife in Eq. (42). The result is that one should 



estimate /(//y,//^) from 2f{y,z) - f 



B 



(66) 



The bias in Eq. (66) is of order l/N"^, whereas f{y,z) and each have a bias of order 1/N. 
However, it is not normally necessary to eliminate the bias since, if A^ is large, the bias is much 
smaller than the statistical error. 
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I have not systematically studied the values of A^boot that are needed in practice to get accurate 
estimates for the error. It seems that A^boot in the range 100 to 500 is typically chosen, and this 
seems to be adequate irrespective of how large N is. 

To summarize this subsection, to estimate f{^y,^z) one can either use f{y,z), or the bootstrap 
average in Eq. (62), and the error bar in this estimate, o"/, is related to the standard deviation in 
the bootstrap estimates by Eq. (63). 

4-. Jackknife or Bootstrap? 

The jackknife approach involves less calculation than bootstrap, and is fine for estimating com- 
binations of moments of the measured quantities. Furthermore, identical results are obtained each 
time jackknife is run on the same set of data, which is not the case for bootstrap. However, the 
range of the jackknife estimates is very much smaller, by a factor of \/]V for large N, than the 
scatter of averages which would be obtained from individual data sets, see Eq. (48). By contrast, 
for bootstrap, C/s, which measures the deviation of the bootstrap estimates from the result for 
the single actual data set f{y,z), is equal to aj, the deviation of the average of a single data set 
from the exact result /(/Xy, Hz) (if we replace the factor of N/ (N — 1) by unity, see Eq. (63)). This 
is the main strength of the bootstrap approach; it samples the full range of the distribution of the 
sample distribution. Hence, if you want to generate data which covers the full range then should 
use bootstrap. This is useful in fitting, see for example. Sec. IIIF. However, if you just want to 
generate error bars on combinations of moments quickly and easily, then use jackknife. 

III. FITTING DATA TO A MODEL 

A good reference for the material in this section is Chapter 15 of Numerical Recipes [1]. 

Frequently we are given a set of data points {xi,yi),i = 1, 2, • • • , A^, with corresponding error 
bars, o"j, through which we would like to fit to a smooth function f{x). The function could be 
straight line (the simplest case), a higher order polynomial, or a more complicated function. The 
fitting function will depend on M "fitting parameters" , Oq and we would like the "best" fit obtained 
by adjusting these parameters. We emphasize that a fitting procedure should not only 

1. give the values of the fit parameters, but also 

2. provide error estimates on those parameters, and 
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3. provide a measure of how good the fit is. 

If the result of part 3 is that the fit is very poor, the results of parts 1 and 2 are probably 
meaningless. 

The definition of "best" is not unique. However, the most useful choice, and the one nearly 
always taken, is "least squares", in which one minimizes the sum of the squares of the difference 
between the observed y- value, yi, and the fitting function evaluated at Xj, weighted appropriately 
by the error bars since if some points have smaller error bars than others the fit should be closer 
to those points. The quantity to be minimized, called "chi-squared" ,^ and written mathematically 
as is therefore 



Often we assume that the distribution of the errors is Gaussian, since, according to the central 
limit theorem discussed in Appendix A, the sum of N independent random variables has a Gaussian 
distribution (under fairly general conditions) if N is large. However, distributions which occur in 
nature usually have more weight in the "tails" than a Gaussian, and as a result, even for moderately 
large values of N , the probability of an "outlier" might be much bigger than expected from a 
Gaussian, see Fig. 6. 

If the errors are distributed with a Gaussian distribution, and if f{x) has the exact values of 
the fit parameters, then \^ in Eq. (67) is a sum of squares of N random variables with a Gaussian 
distribution with mean zero and standard deviation unity. However, when we have minimized the 
value of with respect to the M fitting parameters the terms are not all independent. It turns 
out, see Appendix B, that, at least for a linear model (which we define below), the distribution of 

at the minimum is that of the sum of the squares of — M (not A^) Gaussian random variable 
with zero mean and standard deviation unity^ . We call N — M the "number of degrees of freedom" 
(A^dof)- The distribution is discussed in Appendix C. The formula for it is Eq. (C6). 

The simplest problems are where the fitting function is a linear function of the parameters. We 
shall call this a linear model. Examples are a straight line (M = 2), 



^ should be thought of as a single variable rather than the square of something called x- This notation is standard. 
Although this result is only valid if the fitting model is linear in the parameters, it is usually taken to be a 
reasonable approximation for non-linear models as well. 




(67) 



y = ao + aix 



(68) 
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and an m-th. order polynomial (M = m + 1), 

m 

y = oo + aix + a2X^ H h amx"^ = ^ aaX*" , (69) 

0=0 

where the parameters to be adjusted are the a^- (Note that we are not stating here that y has to 
be a linear function of x, only of the fit parameters a^.) 

An example where the fitting function depends non-linearly on the parameters is 

y = aox"-^ + a2 . (70) 

Linear models are fairly simply because, as we shall see, the parameters are determined by 
linear equations, which, in general, have a unique solution that can be found by straightforward 
methods. However, for fitting functions which are non-linear functions of the parameters, the 
resulting equations are non-linear which may have many solutions or none at all, and so are much 
less straightforward to solve. We shall discuss fitting to both linear and non-linear models in these 
notes. 

Sometimes a non-linear model can be transformed into a linear model by a change of variables. 
For example, if we want to fit to 

y = aox'^' , (71) 

which has a non-linear dependence on ai, taking logs gives 

In y = In ao + oi In a; , (72) 

which is a linear function of the parameters Cq = Inoo and oi. Fitting a straight line to a log-log 
plot is a very common procedure in science and engineering. However, it should be noted that 
transforming the data does not exactly take Gaussian errors into Gaussian errors, though the 
difference will be small if the errors are "sufficiently small" . For the above log transformation this 
means o'i/yi <^ 1, i.e. the relative error is much less than unity. 



A. Fitting to a straight line 

To see how least squares fitting works, consider the simplest case of a straight line fit, Eq. (68), 
for which we have to minimize 

X ao,ai =2^^ j , (73) 
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with respect to ao and ai. Differentiating witli respect to tliese parameters and setting tlie 
results to zero gives 



N 



1 



N 



1=1 

N 



N 



E-L , \ ^ -^i \ ^ Hi 



ao } . — + ai } ^— = } (74a) 

(74b) 



N 2 

xf 



i=l 
N 



E-^i , 'sr~^ Xj -^-^ •i'iyi 
1=1 * 



i=l 



i=l 



We write tliis as 



Uootto + Uoiai =vo, 
Uio ao + Uii ai = vi, 



(75a) 
(75b) 



wliere 




and (76) 

(77) 

Tlie matrix notation, wliile an overkill here, will be convenient later when we do a general polyno- 
mial fit. Note that Uiq = Uqi. (More generally, later on, U will be a symmetric matrix). Equations 
(75) are two linear equations in two unknowns. These can be solved by eliminating one variable, 
which immediately gives an equation for the second one. The solution can also be determined from 



f3=0 



a/3 



(78) 



(where we have temporarily generalized to a polynomial of order m). For the straight-line fit, the 
inverse of the 2x2 matrix \J is given, according to standard rules, by 



^~i _}_ I ~^oi 



A 



where 



A = UooUn - U, 



01 > 



(79) 



(80) 



and we have noted that U is symmetric so [/qi = f^io- The solution for oq and ai is therefore given 
by 



ao 



Uii vq - Uqi vi 
A 



ai 



-Urn + ^00 -^1 
A 



(81a) 
(81b) 
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We see that it is straightforward to determine the slope, ai, and the intercept, oq, of the fit from 
Eqs. (76), (77), (80) and (81) using the N data points {xi,yi), and their error bars cjj. 

B. Fitting to a polynomial 

Frequently we need to fit to a higher order polynomial than a straight line, in which case we 
minimize 

x\ao, ai, • • • , a„) = f; (^^i^E^o^^ ' (82) 
1=1 ^ * / 

with respect to the (m + 1) parameters Oq. Setting to zero the derivatives of with respect to 
the Oq gives 



13=0 



(83) 



where Ua/s and Va have been defined in Eqs. (76) and (77). Eq. (83) represents M = m + 1 linear 
equations, one for each value of a. Their solution is again given by Eq. (78), i.e. it is expressed in 
terms of the inverse matrix U~^. 

C. Error Bars 

In addition to the best fit values of the parameters we also need to determine the error bars in 
those values. Interestingly, this information is also contained in the matrix U~^. 

First of all, we explain the significance of error bars in fit parameters. We assume that the data 
is described by a model with a particular set of parameters a^^'^^ which, unfortunately, we don't 
know. If we were, somehow, to have many real data sets each one would give a different set of 
fit parameters a^^\i = 0, 1, 2, • • • , because of noise in the data, clustered about the true set a^^^^. 
Projecting on to a single fit parameter, ai say, there will be a distribution of values -P(ai) centered 
on a^'"*^ with standard deviation cji, see the top part of Fig. 2. Typically the value of oi obtained 
from our one actual data set, af'\ will lie within about ai of ai. Hence we define the error bar to 
be cJi. 



Unfortunately, we can't determine the error bar this way because we have only one actual data 
set, which we denote here by yf^^ to distinguish it from other data sets that we will introduce. Our 
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FIG. 2: The top figure shows the distribution of one of the fit parameters oi if one could obtain many real 
data sets. The distribution has standard deviation cti about the true value a*f^'^ and is Gaussian if the noise 
on the data is Gaussian. In fact, however, we have only one actual data set which has fit parameter af\ and 



this typically lies within about cti of a^''"°. Hence we 


define the error bar on the estimate of af^'^ to be cti . 


However, we 


cannot calculate cti directly from the distribution of ai 


because we have only one the one 



value, 



,(0) 



However, we can generate many simulated data sets from the one actual set and hence 



we can estimate the standard deviation, erf. of the distribution of the resulting fit parameter af , which 



is shown in the lower figure. This distribution is centered about the value from the actual data, a^i \ and 
has standard deviation, af . The important point is that if one assumes a linear model then one can show 
see text. Even if the model is non linear, one usually assumes that the difference in the stan- 



that 



(Ti — fTi, 



dard deviations is sufficiently small that one can still equate the true error bar with the standard deviation 
from the simulated data sets. We emphasize that 



the error bar quoted by fitting programs is actually erf 



and this is assumed to equal cti. Furthermore, as shown in Appendices E and F. if the noise on the data is 
Gaussian (and the model is linear) both the distributions in this figure are also Gaussian. 

actual data set gives one set of fit parameters, which we call d^^\ Suppose, however, we were to 
generate many simulated data sets from of the one which is available to us, by generating random 
values (possibly with a Gaussian distribution though this won't be necessary yet) centered at the 
Hi with standard deviation cjj. Fitting each simulated dataset would give different values for a, 
clustered now about a^^\ see the bottom part of Fig. (2). We now come to an important, but rarely 
discussed, point: 



We assume that the standard deviation of the fit parameters of these simulated data 
sets about a^^^ , which we will be able to calculate from the single set of data available 
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to us, is equal to the standard deviation of the fit parameters of real data sets a about 
^rue^ The latter is what we really want to know (since it is our estimate of the error 
bar on a}"^^^) but can't determine directly. See Fig. 2 for an illustration. In fact we 
show in the text below that this assumption is correct for a linear model (and for 
a non-linear model if the range of parameter values is small enough that it can be 
represented by an effective linear model). Even if the model is non linear, one usually 
assumes that the two standard deviations are sufficiently close that the difference is 
not important. Furthermore, we show in Appendices E and F that if the noise on the 
data is Gaussian (and the model is linear), the two distributions in Fig. (2) are also 
both Gaussian. 

Hence, as stated above, to derive the error bars in the fit parameters we take simulated values 
of the data points, yf , which vary by some amount 5yf about yf'\ i.e. 5yf = yf — yf'\ with a 
standard deviation given by the error bar cjj. The fit parameters of this simulated data set, , 
then deviate from a^^') by an amount 6d^ where 

Averaging over fluctuations in the we get the variance of to be 

since ((5yf )^) = erf, and the data points yi are statistically independent. Writing Eq. (78) explicitly 
in terms of the data values, 

N 

«« = E(f^-\/.E^' (86) 



P i=l 

and noting that U is independent of the yi, we get 



?^ = y ([/-I) ^4- (87) 



Substituting into Eq. (85) gives 



= 7 \ U I „ \ U 

/3,7 



4 = 1 ' 



(88) 



The term in rectangular brackets is just Up^f, and so, noting that U is given by Eq. (76) and is 
symmetric, the last equation reduces to 

{-'S = {U-')a. ■ (89) 
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Recall that is the standard deviation of the fitted parameter values about the a^^^ when con- 
structing simulated data sets from the one set of data that is available to us. However, the error 
bar is defined to be the standard deviation the fitted parameter values would have relative to ajjj'"'^ 
if we could average over many actual data sets. To determine this quantity we simply repeat the 
above calculation with 6yi = Hi — yf^^^ in which yi is the value of the i-th data point in one of the 
actual data sets. The result is identical to Eq. (89), namely 

(90) 

in which U is the same in Eq. (90) as in Eq. (89) because U is a constant, for a linear model, 
independent of the yi or the fit parameters aa- Hence aa in Eq. (90) is the error bar in a^. 

In addition to error bars, we also need a parameter to describe the quality of the fit. A useful 
quantity is the probability that, given the fit, the data could have occurred with a greater than 
or equal to the value found. This is generally denoted by Q and is given by Eq. (C9) assuming 
the data have Gaussian noise. Note that the effects of non- Gaussian statistics is to increase the 
probability of outliers, so fits with a fairly small value of Q, say around 0.01, may be considered 
acceptable. However, fits with a very small value of Q should not be trusted and the values of the 
fit parameters are probably meaningless in these cases. 



For the case of a straight line fit, the inverse of U is given explicitly in Eq. (79). Using this 
information, and the values of {xi,yi,ai) for the data in Fig. 3, the fit parameters (assuming a 
straight line fit) are 

ao = 0.84 ± 0.32, (91) 
ai = 2.05 ±0.11, (92) 

in which the error bars on the fit parameters on oq and oi, which are denoted by ao and ai, are 
determined from Eq. (90). The data was generated by starting with y = 1 + 22; and then adding 
some noise with zero mean. Hence the fit should be consistent with y = l + 2x within the error bars, 
and it is. The value of is 7.44 so x^/^dof = 7.44/9 = 0.866 and the quality of fit parameter, 
given by Eq. (C9), is Q = 0.592 which is good. 

We call the ^^covariance matrix". Its off-diagonal elements are also useful since they 
contain information about correlations between the fitted parameters. More precisely, one can 
show, following the lines of the above derivation of o"^, that the correlation of fit parameters a and 
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FIG. 3: An example of a straight-line fit to a set of data with error bars. 

P, known mathematically as their "covariance" , is given by the appropriate off-diagonal element 
of the covariance matrix, 

Cov(a, /3) = {6aa 5ap) = • (93) 

The correlation coefficient, Tq,^, which is a dimensionless measure of the correlation between 5aa 
and 5a p lying between —1 and 1, is given by 

Cov(a^^ (94) 

A good fitting program should output the correlation coefficients as well as the fit parameters, 
their error bars, the value of X^/-^DOF) and the goodness of fit parameter Q. 

For a linear model, is a quadratic function of the fit parameters and so the elements of 
the ^^curvature matrix^^^, {1/2) d'^y^ /daadap are constants, independent of the values of the fit 
parameters. In fact, we see from Eqs. (76) and (82) that 

1 92 2 

2;^ = ^"«- 



* It is conventional to include the factor of 1/2. 
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so the curvature matrix is equal to U, given by Eq. (76) for a polynomial fit. 
If we fit to a general linear model, writing 

M 



f{x) = 0^X0,(3 



(96) 



a=l 



where Xi{x),X2{x)^ ■ ■ ■ ,XMix) a fixed functions of x called basis functions, the curvature matrix 
is given by 



(97) 




Similarly, the quantities Va in Eq. (77) become 




(98) 



for a general set of basis functions, and best fit parameters are given by the solution of the M 
linear equations 




(99) 



for a = 1, 2, • • • , M. Note that for a linear model the curvature matrix [/ is a constant, independent 
of the fit parameters. However, U is not constant for a non-linear model. 



D. Fitting to a non-linear model 

As for linear models, one minimizes iii Eq. (67). The difference is that the resulting equations 
are non-linear so there might be many solutions or non at all. Techniques for solving the coupled 
non- linear equations invariably require specifying an initial value for the variables a^. The most 
common method for fitting to non-linear models is the Levenberg-Marquardt (LM) method, see 
e.g. Numerical Recipes [1]. Implementing the Numerical Recipes code for LM is a little complicated 
because it requires the user to provide a routine for the derivatives of with respect to the fit 
parameters as well as for itself, and to check for convergence. Alternatively, one can use the 
fitting routines in the scipy package of python or use gnuplot. But see the comments in Appendix 
D about getting the error bars in the parameters correct. This applies when fitting to linear as 
well as non-linear models. Gnuplot and scipy scripts for fitting to a non-linear model are given in 
Appendix G. 
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One difference from fitting to a linear model is that the curvature matrix, defined by the LHS 
of Eq. (95), is not constant but is a function of the fit parameters. Hence it is no longer true that 
the standard deviations of the two distributions in Fig. 2 are equal. However, it still generally 
assumed that the difference is small enough to be unimportant and hence that the covariance 
matrix, which is now defined to be the inverse of the curvature matrix at the minimum of x^i still 
gives information about error bars on the fit parameters. This is discussed more in the next two 
subsections, in which we point out, however, that a more detailed analysis is needed if the model is 
non-linear and the spread of fitted parameters is sufficiently large that it cannot be represented by 
an effective linear model, i.e. is not well fitted by a parabola over the needed range of parameter 
values. 

As a reminder: 

• The curvature matrix is defined in general by the LHS of Eq. (95), which, for a linear model, 
is equivalent to Eq. (97) (Eq. (76) for a polynomial fit.) 

• The covariance matrix is the inverse of the curvature matrix at the minimum of (the 
last remark being only needed for a non-linear model). Its diagonal elements give error bars 
on the fit parameters according to Eq. (90) (but see the caveat in the previous paragraph 
for non-linear models) and its off-diagonal elements give correlations between fit parameters 
according to Eqs. (93) and (94). 



In the last two subsections we showed that the diagonal elements of the covariance matrix give 
an error bar on the fit parameters. In this section we extend the notion of error bar to embrace 
the concept of a "confidence limit". 

There is a theorem [1] which states that, for a linear model, if we take simulated data sets 
assuming Gaussian noise in the data about the actual data points, and compute the fit parameters 
a^(^\i = 1,2,- •• for each data set, then the probability distribution of the is given by the 
multi-variable Gaussian distribution 



E. 



Confidence limits 




(100) 



where 6a = and U, given by Eq. (97), is the curvature matrix which can also be defined 

in terms of the second derivative of according to Eq. (95). A proof of this result is given in 
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Appendix E. It applies for a linear model with Gaussian noise, and also for a non-linear model if 
the uncertainties in the parameters do not extend outside a region where an effective linear model 
could be used. (In the latter case one still needs a non-linear routine to find the best parameters). 
Note that for a non-linear model, U is not a constant and is the curvature at the minimum of x^- 
From Eq. (95) the change in as the parameters are varied away from the minimum is given 

by 

Ax^ - X^(a^«) - X^(a(°)) = E ^^-/^ ^4 ' (lO^) 

in which the a,re all evaluated from the single (actual) data set y^^^ ■ Equation (100) can therefore 
be written as 

P(a^)ocexp(^-^Ax') . (102) 

We remind the reader that we have assumed the noise in the data is Gaussian and that either the 
model is linear or, if non-linear, the uncertainties in the parameters do not extend outside a region 
where an effective linear model could be used. 

Hence the probability of a particular deviation, 6a^ , of the fit parameters in a simulated data 
set away from the parameters in the actual data set, depends on how much this change increases 
(evaluated from the actual data set) away from the minimum. In general a "confidence limit" is the 
range of fit parameter values such that Ax^ is less than some specified value. The simplest case, 
and the only one we discuss here, is the variation of one variable at a time, though multi-variate 
confidence limits can also be defined, see Numerical Recipes [1]. 

We therefore consider the change in x"^ when one variable, af say, is held at a specified value, and 
all the others (/3 = 2, 3, • • • , M) are varied in order to minimize x^- Minimizing Ax^ in Eq. (101) 
with respect to gives 

M 

Y,Up^^a^ = 0, (/3 = 2,3,--- ,M). (103) 

7=1 

The corresponding sum for /3 = 1, namely Uij6a^, is not zero because 6ai is fixed. It will 

be some number, c say. Hence we can write 

M 

Y,Ua^Sa^ = Ca, (a = 1,2,--- ,M), (104) 

7=1 

where ci = c and = (/3 7^ 1). The solution is 

M 

<5af = ^(C/-I)„^c;3. (105) 

/3=1 
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For a = 1 this gives 



c = 5af/{U~')^^ . (106) 
Substituting Eq. (105) into Eq. (101), and using Eq. (106) we find that Ax^ is related to ((5af)^ 

by 



Ax' 



2_ i^aff 



(c/-i; 



(107) 
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(Curiously, the coefficient of ((5ai)^ is one over the 11 element of the inverse of U, rather than Uu 
which is how it appears in Eq. (101) in which the /3 7^ 1 parameters are free rather than adjusted 
to minimize x^-) 

From Eq. (102) we finally get 

1 (5af^2^ 



P{ai ) oc exp 



2 al 



(108) 



where 



U~ 



'11 



(109) 



As shown in Appendices E and F, Eqs. (100), (102) and (108) also apply, under the same 
conditions (linear model and Gaussian noise on the data) to the probability for 6ai = 
where we remind the reader that af'^ is the fit parameter obtained from the actual data, and a^^^'^ 
is the exact value. In other words the probability of the true value is given by 



P(a*''^^) oc exp ( -^^X^ 



(110) 



where 



(111) 



in which we remind the reader that both values of are evaluated from the single set of data 
available to us, y.^''. Projecting onto a single parameter, as above, gives 



P(a*i™^) oc exp 



1 {Saif 
2 



(112) 



so {{6ai] 



(jf = (f/ ^)^]^, in agreement with what we found earlier in Eq. (90). We emphasize 



that Eqs. (110) and (112) assumes Gaussian noise on the data points, and either the model is 
linear or, if non-linear, that the range of uncertainty in the parameters is small enough that a 
description in terms of an effective linear model is satisfactory. 
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FIG. 4: Left: The change in ^ as a fit parameter a\ is varied away from the value that minimizes for 
a linear modeL The shape is a parabola for which Ax^ = 1 when 8a = ±(Ti, where a\ is the error bar. 
Right: The solid curve is a sketch of the change in for a non-linear model. The curve is no longer 
a parabola and is even non-symmetric. The dashed curve is a parabola which fits the solid curve at the 
minimum. The fitting program only has information about the local behavior at the minimum and so gives 
an error range ±cti where the value of the parabola is 1. However, the parameter a\ is clearly more tightly 
constrained on the plus side than on the minus side, and a better way to determine the error range is to 
look globally and locate the values of Sai where Ax^ = 1- This gives an error bar on the plus side, and 
a different error bar, ctj^, on the minus side, both of which are different from cti. 

However we have done more than recover our earlier result, Eq. (90), by more complicated means 
since we have gained additional information. From the properties of a Gaussian distribution we 
now know that, from Eq. (112), the probability that aa lies within one standard deviation cJq, of 
the value which minimizes is 68%, the probability of its being within two standard deviations 
is 95.5%, and so on. Furthermore, from Eq. (110), we see that 

if a single fit parameter is one standard deviation away from its value at the minimum 
of (ihe other fit parameters being varied to minimize x^), then Ax^ = 1- 

This last sentence, and the corresponding equations Eqs. (110) and (112), are not valid for a 
non-linear model if the uncertainties of the parameters extends outside the range where an effective 
linear model can be used. In this situation, to get confidence limits, is is necessary to do a bootstrap 
resampling of the data, as discussed in the next subsection. 
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However, if one is not able to resample the data we argue that it is better to take the range 
where < 1 as an error bar for each parameter rather than the error bar determined from the 
curvature of at the minimum, see Fig. 4. The left hand plot is for a linear model, for which the 
curve of Ax^ against 5ai is exactly a parabola, and the right hand plot is a sketch for a non-linear 
model, for which it is not a parabola though it has a quadratic variation about the minimum shown 
by the dashed curve. For the linear case, the values of 6ai where Ax^ = 1 are the same as the 
values ±0"!, where cJi is the standard error bar obtained from the local curvature in the vicinity of 
the minimum. However, for the non-linear case, the values of 6ai where Ax^ = 1 are different from 
±0"!, and indeed the values on the positive and negative sides, af and , are not equal. For the 
data Fig. 4, it is clear that the value of ai is more tightly constrained on the positive side than the 
negative side, and so it is better to give the error bars as +(^1 and — obtained from the range 
where Ax^ < 1, rather the symmetric range itcji. However, if possible, in these circumstances 
error bars and a confidence limit should actually be obtained from a bootstrap resampling of the 
data as discussed in the next section. 



F. Confidence limits by resampling the data 

More work is involved if one wants to get error bars and a confidence interval in the case where 
the model is non-linear and the range of parameter uncertainty extends outside the region where 
an effective linear model is adequate. Even for a linear model, we cannot convert Ax^ into a 
confidence limit with a specific probability if the noise is non-Gaussian. 

To proceed in these cases, one can bootstrap the individual data points as follows. Each data 
point {xi,yi) has error bar cjj, which comes from averaging over N measurements, say. Generating 
bootstrap datasets by Monte Carlo sampling the N measurements, as discussed in Sec. HB3, the 
distribution of the mean of each bootstrap dataset has a standard deviation equal to the estimate 
of standard deviation on the mean of the actual data set, see Eq. (63) (replacing the factor of 
■\JN/[N — 1) by unity which is valid since is large in practice). Hence, if we generate A'boot 
bootstrap data sets, and fit each one, the scatter of the fitted parameter values will be a measure 
of the uncertainty in the values from the actual dataset. Forming a histogram of the values of 
a single parameter we can obtain a confidence interval within which 68%, say, of the bootstrap 
datasets lie (16% missing on either side) and interpret this range as a 68% confidence limit for the 
actual parameter value. The justification for this interpretation has been discussed in the statistics 
literature, see e.g. the references in Ref. [1], but Fm not able to go into the details here. Note 
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that this bootstrap approach could also be applied usefully for a linear model if the noise is not 
Gaussian. 

Unfortunately, use of the bootstrap procedure to get error bars in fits to non-linear models does 
not yet seem to be a standard procedure in the statistical physics community. 

Another possibility for a non-linear model, if one is confident that the noise is close to Gaus- 
sian, is to generate simulated data sets, assuming Gaussian noise on the yi values with standard 
deviation given by the error bars cjj. Each simulated dataset is fitted and the distribution of fitted 
parameters is determined. This corresponds to the analytical approach in Appendix E but without 
the assumption that the model can be represented by an effective linear one over of the needed 
parameter range. 



G. A tale of two probabilities. When can one rule out a fit? 

If the noise on the data is Gaussian, which we will assume throughout this subsection, we have, 
so far, considered two different probabilities. 

Firstly, as discussed in Appendix C, the value of is typically in the range A^'dof i ^/2MooF■ 
The quality of fit parameter Q is the probability that, given the fit, the data could have 
this value of or greater, and is given mathematically by Eq. (C9). It varies from unity 
when ^ A^DOF — V^A^DOF to zero when x^ ^ A^dqf + \/2A^dof- We emphasize that 



Q is the probability of the data given the fit. 



Secondly, in the context of error bars and confidence limits, we have discussed, in Eqs. (110) 
and (112), the probability that a fit parameter, ai say, takes a certain value relative to the optimal 
one. Equation (110) becomes very small when A^^ varies by much more than unity. Note that 



Eqs. (110) and (112) refer to the relative probabilities of two fits, given the data. 



At first, it seems curious that the probability Q remains significantly greater than zero if x^ 
changes by an amount of order V^DOFi whereas if a fit parameter is changed by an amount 
such that x^ changes by of order \/ A^dof ) the probability of this value becomes extremely small, 
of order exp(— const. \/Adof)) in this limit, see Eqs. (110). While there is no mathemematical 
inconsistency, since the two probabilities refer to different situations (one is the probability of the 
data given the fit and the other is the relative probability of two fits given the data), it is useful 
to understand this difference intuitively. 
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FIG. 5: Left: A straight-line fit to a data set. The value of Q is reasonable. However, one notices that the 
data is systematically above the fit for small x and for large x while it is below the fit for intermediate x. 
This is unlikely to happen by random chance. This remark is made more precise in the right figure. 
Right: A parabolic fit to the same data set. The value of Q is larger than for the straight-line fit, but the 
main result is that the coefficient of the quadratic term is 5 cr away from zero, showing that the straight-line 
fit in the left figure is much less likely than the parabolic fit. 

We take, as an example, a problem where we want to know whether the data can be modeled 
by a straight line, or whether a quadratic term needs to be included as well. A set of data is shown 
in Fig. 5. 

Looking at the left figure one sees that the data more or less agrees with the straight-line fit. 
However, one also sees systematic trends: the data is too high for small x and for high x, and too 
low for intermediate x. 



The probability that this trend would occur by chance is very low. 



Chi- 



squared just sums up the contributions from each data point and is insenstive to any systematic 
trend in the deviation of the data from the fit. Hence the value of in itself, does not tell 
us that this data is unlikely to be represented by a straight line. It is only when we add another 
parameter in the fit which corresponds to those correlations, that we realize the straight-line model 
is relatively very unlikely. In this case, the extra parameter is the coefficient of x^, and the resulting 
parabolic fit is shown in the right figure. 

The qualitative comments in the last paragraph are made more precise by the parameters of 
the fits. The straight-line fit gives oq = 0.59 ±0.26, ai = 2.003 ±0.022 with Q = 0.124, whereas the 
parabolic fit gives ao = 2.04 ± 0.40, oi = 1.588 ± 0.090, 02 = 0.0203 ± 0.0042 with Q = 0.924. The 
actual parameters used to generate the data are ao = 2,ai = 1.6,02 = 0.02, and there is Gaussian 
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noise with standard deviation equal to 0.8. Although the quality of fit factor for the straight-line 
fit is reasonable, the quadratic fit strongly excludes having the fit parameter 02 equal to zero, 
since zero is five standard deviations away from the best value. For a Gaussian distribution, the 
probability of a five-sigma deviation or greater is erfc(5/-v/2) — 6 x 10~^. The difference in for 
the quadratic fit, between the best fit and the fit forcing 02 = 0, is (0.0203/0.0042)^ ~ 23 according 
to Eqs. (107) and (90). We conclude that, in this case, the straight-line model is unlikely to be 
correct. 

The moral of this tale is that a reasonable value of Q does not, in itself, ensure that you have 
the right model. Another model might be very much more probable. 



Appendix A: Central Limit Theorem 

In this appendix we give a proof of the central limit theorem. 

We assume a distribution that falls off sufficiently fast at ^00 that the mean and variance are 
finite. This excludes, for example, the Lorentzian distribution: 

^Lor = -T^. (Al) 

t: 1 + x'^ 

A common distribution which does have a finite mean and variance is the Gaussian distribution, 

(rr-M)2n 



-PCauss = , exp 

v27r a 



2.^ ■ 

Using standard results for Gaussian integrals you should be able to show that the distribution is 
normalized and that the mean and standard deviation are /i and a respectively. 

Consider a distribution, not necessarily Gaussian, with a finite mean and distribution. We pick 

N random numbers Xi from such a distribution and form the sum 

N 

i=l 

Note, we are assuming that all the random numbers have the same distribution. 

The determination of the distribution of X, which we call Pn{X), uses the Fourier transform 
of P{x), called the "characteristic function" in the context of probability theory. This is defined 
by 



/oo 
P(x)e^^^ dx. 
-00 



Expanding out the exponential we can write Q{k) in terms of the moments of P{x) 

Q{k) = l + ^k{x) + ^^{x')+^-^{x') + .... 
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It will be convenient in what follows to write Q{k) as an exponential, i.e. 
Q{k) = exp 



In(l + .fc(x) + ^(x2) + i!^(x3) + 



2! 



3! 



exp 



ikfi 



2! 



3! 



4! 



(A3) 



where C3 involves third and lower moments, C4 involves fourth and lower moments, and so on. The 
Cn are called cumulant averages. 

For the important case of a Gaussian, the Fourier transform is obtained by "completing the 
square". The result is that the Fourier transform of a Gaussian is also a Gaussian, namely, 



QGauss(^) = exp 






ik^i 2 


5 



(A4) 



showing that the higher order cumulants, 03,04, etc. in Eq. (A3) all vanish for a Gaussian. 
The distribution P/v(3;) can be expressed as 

/oo poo poo 

P{xi)dxi / P{x2)dx2--- P{xN)dXNS{X -^Xi). (A5) 

-00 J —00 J —00 

We evaluate this by using the integral representation of the delta function 

S{x) 



i=l 



2tt 



6^^=^ dk . 



(A6) 



so 



/oo />oo roo POO 
— / P{xi)dxi / P{x2)dx2--- / P{xN)dxN ex.p[ik{xi + X2 -\ xn - X)] 
-00 J —00 J —00 J —00 

(A7) 



dk 
2^ 



QikY'e 



N-ikX 



showing that the Fourier transform of Pi\f{x), which we call QN{k), is given by 



QN{k) = Q{k)^ . 



Consequently 



QN{k) = exp 



ikNn — + '^r^ + ^1-^ + 



4! 



4! 



(A8) 



(A9) 



(AlO) 



Comparing with Eq. (A3) we see that 



the mean of the distribution of the sum of A'^ independent and identically distributed 
random variables (the coefficient of —ik in the exponential) is N times the mean of 
the distribution of one variable, and the variance of the distribution of the sum (the 
coefficient of —k'^/2\) is N times the variance of the distribution of one variable. 
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These are general statements applicable for any N and have already been derived in Sec. II A. 

However, if is large we can now go further. The distribution Pfyf{X) is the inverse transform 
of QN{k), see Eq. (A8), so 



Pn{X) 



1 

2^ 



exp 



-ikX' 



+ N^^—!- H \ ^ + 



2! 



3! 



4! 



dk . 



where 



X' = X -Nfi. 



(All) 



(A12) 



Looking at the —Nk'^/2 term in the exponential in Eq. (All), we see that the integrand is significant 
for k < k* , where Na'^ik*)'^ = 1, and negligibly small for k ^ k* . However, for Q < k < k* the 
higher order terms in Eq. (All), (i.e. those of order k^,k'^ etc.) are very small since N{k*')^ ~ 
N^^^"^ , N{k*)^ ~ N~'^ and so on. Hence the terms of higher order than k'^ in Eq. (All), do not 
contribute for large N and so 

Nk^a^- 



1 f°° 

lim Pn{X) = — exp 



-ikX' 



dk. 



(A13) 



In other words, for large the distribution is the Fourier transform of a Gaussian, which, as we 
know, is also a Gaussian. Completing the square in Eq. (A13) gives 



lim Pn{X) 

N^oo 



1 

2^ 



exp 



k 



iX' 



dk exp 



l\2 



1 



V2irN a 



exp 



(A14) 



where, in the last line, we used Eq. (A12). This is a Gaussian with mean A^/x and variance Na'^ 



Equation (A14) is the central limit theorem in statistics. It tells us that. 



for A^ — 7- oo, the distribution of the sum of A^ independent and identically distributed 
random variables is a Gaussian whose mean is A^ times the mean, /x, of the distribution 
of one variable, and whose variance is A^ times the variance of the distribution of one 
variable, o"^, independent of the form of the distribution of one variable., P{x), provided 
only that /i and a are finite. 

The central limit theorem is of such generality that it is extremely important. It is the reason 
why the Gaussian distribution has such a preeminent place in the theory of statistics. 

Note that if the distribution of the individual Xi is Gaussian, then the distribution of the sum 
of A^ variables is always Gaussian, even for A^ small. This follows from Eq. (A9) and the fact that 
the Fourier transform of a Gaussian is a Gaussian. 
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FIG. 6: Figure showing the approach to the central hmit theorem for the distribution in Eq. (A15), which 
has mean, equal to 0, and standard deviation, cr, equal to 1. The horizontal axis is the sum of N random 
variables divided by \/N which, for all N, has zero mean and standard deviation unity. For large N the 
distribution approaches a Gaussian. However, convergence is non-uniform, and is extremely slow in the 
tails. 

In practice, distributions that we meet in nature, have a much broader tail than that of the 
Gaussian distribution, which falls off very fast at large \x — fi\/a. As a result, even if the distribution 
of the sum approximates well a Gaussian in the central region for only modest values of iV, it 
might take a much larger value of to beat down the weight in the tail to the value of the 
Gaussian. Hence, even for moderate values of A^, the probability of a deviation greater than a can 
be significantly larger than that of the Gaussian distribution which is 32%. This caution will be 
important in Sec. Ill when we discuss the quality of fits. 



We will illustrate the slow convergence of the distribution of the sum to a Gaussian in Fig. (6), in 
which the distribution of the individual variables Xi is 
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This has mean and standard deviation 1, but moments higher than the second do not exist 
because the integrals diverge. For large N the distribution approaches a Gaussian, as expected, 
but convergence is very slow in the tails. 



Appendix B: The number of degrees of freedom 

Consider, for simplicity, a straight line fit, so we have to determine the values of ao and oi which 
minimize Eq. (73). The N terms in Eq. (73) are not statistically independent at the minimum 
because the values of ao and ai, given by Eq. (74), depend on the data points (xj,yj,crj). 

Consider the "residuals" defined by 

€i = . (Bl) 

a 

If the model were exact and we use the exact values of the parameters oq and ai the would be 
independent and each have a Gaussian distribution with zero mean and standard deviation unity. 

However, choosing the best-fit values of ao and ai from the data according to Eq. (74) implies 
that 

N 



V - = , (B2a) 



O". 
i=l 

N 

X 



V^e. = 0, (B2b) 



■ 1 



which are are two linear constraints on the ej. This means that we only need to specify — 2 of 
them to know them all. In the N dimensional space of the we have eliminated two directions, 
so there can be no Gaussian fluctuations along them. However the other N — 2 dimensions are 
unchanged, and will have the same Gaussian fluctuations as before. Thus has the distribution of 
a sum of squares of A^ — 2 Gaussian random variables. We can intuitively understand why there are 
N — 2 degrees of freedom rather than N by considering the case of = 2. The fit goes perfectly 
through the two points so one has = exactly. This implies that there are zero degrees of 
freedom since, on average, each degree of freedom adds 1 to x^- 

Clearly this argument can be generalized to any fitting function which depends linearly on M 
fitting parameters, with the result that has the distribution of a sum of squares of A'^dof = 
N — M Gaussian random variables, in which the quantity A'^dof is called the "number of degrees 
of freedom" . 

Even if the fitting function depends non-linearly on the parameters, this last result is often 
taken as a reasonable approximation. 
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Appendix C: The chi-squared distribution and the goodness of fit parameter Q 

The distribution for m degrees of freedom is the distribution of the sum of m independent 
random variables with a Gaussian distribution with zero mean and standard deviation unity. To 
determine this we write the distribution of the m variables Xi as 

P(xi,X2,--- ,Xm)dxidx2---dxm = ^2^)m/2 ^'""'^^ ^"''"'^^ ' ' ' ^"^^^^ dxidx2 ■ ■ ■ dx.^ . (CI) 

Converting to polar coordinates, and integrating over directions, we find the distribution of the 
radial variable to be 

P(r) dr = e-""'^ dr , (C2) 

(27r)W2 ' ^ ^ 

where Sm is the surface area of a unit m-dimensional sphere. To determine Sm we integrate 

Eq. (C2) over r, noting that P{r) is normalized, which gives 

= f(W2) ' ^^^^ 
where T(x) is the Euler gamma function defined by 

POO 

r{x)= I e-^e-'dt. (C4) 

From Eqs. (C2) and (C3) we have 







^ > 2™/2-ir(?n/2) ^ ' 

This is the distribution of r but we want the distribution of = Yli xf = r"^. To avoid confusion 
of notation we write X for , and define the distribution for m variables as (X) . We have 
p(™)(X) dX = P{r) dr so the distribution for m degrees of freedom is 

P(r) 



dX/dr 



The distribution is zero for X < 0. Using Eq. (C4) and the property of the gamma function 
that r(n + 1) = nT{n) one can show that 

;>oo 

/ = 1, (C7a) 

Jo 

/•CO 

{X)= X (X) dX = m, (C7b) 
Jo 

l-OO 

{X^)= / X^ [X] dX = + 2m , so (C7c) 
(^2) - = 2m. (C7d) 



41 



From Eqs. (C7b) and (C7d) we see that typically lies in the range m — \/2m to m + \/2m. 
For large m the distribution approaches a Gaussian according to the central limit theory discussed 
in Appendix A. Typically one focuses on the value of per degree freedom since this should be 
around unity independent of m. 

The goodness of fit parameter is the probability that the specified value of x^, or greater, could 
occur by random chance. From Eq. (C6) it is given by 

2W2r(m/2) 42 ^ ^ 

r(m/2) 7^2/2 ^ 

which is known as an incomplete gamma function. Code to generate the incomplete gamma function 
is given in Numerical Recipes [1]. There is also a built-in function to generate the goodness of fit 
parameter in the scipy package of python and in the graphics program gnuplot, see the scripts 
in Appendix G. 

Note that Q = 1 for = and Q — )■ for x^ — co- Remember that m is the number of 
degrees of freedom, written as A^dof elsewhere in these notes. 



Appendix D: Asymptotic standard error and how to get correct error bars from gnuplot 



Sometimes one does not have error bars on the data. Nonetheless, one can still use x^ fitting to 
get an estimate of those errors (assuming that they are all equal) and thereby also get an error bar 
on the fit parameters. The latter is called the "asymptotic standard error". Assuming the same 
error bar a^ss for all points, WG determine c ^ss 

from the requirement that x^ per degree of freedom 
is precisely one, i.e. its mean value according to Eq. (C7b). This gives 

,2 



1 



X 



Nbof A^dof ^ 



-E 



^ ' Vi- f{xi) ^ ^ 



(Dl) 



or, equivalently, 



N 



A-DOI 



i=l 



(D2) 



The error bars on the fit parameters are then obtained from Eq. (90), with the elements of U given 
by Eq. (76) in which ui is replaced by cJass- Equivalently, one can set the ai to unity in determining 
U from Eq. (76), and estimate the error on the fit parameters from 



aa "ass ' 



(asymptotic standard error). 



(D3) 
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A simple example of the use of the asymptotic standard error in a situation where we don't 
know the error on the data points, is fitting to a constant, i.e. determining the average of a set of 
data, which we already discussed in detail in Sec. II. In this case we have 

TV 

Uoo = N, vo = Y,yu (D4) 

i=l 

SO the only fit parameter is 

N 

Vq i 

ao 

IJnn 1\ 

1=1 

which gives, naturally enough, the average of the data points, y. The number of degrees of freedom 
is — 1, since there is one fit parameter, so 

N 



1 " 

I^ = Lyy,=y, (D5) 



2 _ 

'^ass 

i=l 



^ E (y- yf , (D6) 



1 1 ^ _2 



and hence the square of the error on ao is given, from Eq. (D3), by 

J_ 2 ^ \ 

Uoo N{N , 

which is precisely the expression for the error in the mean of a set of data given in Eq. (16). 

I now mention that a popular plotting program, gnuplot, which also does fits, presents error 
bars on the fit parameters incorrectly if there are error bars on the data. Whether or not there are 
error bars on the points, gnuplot presents the "asymptotic standard error" on the fit parameters. 
Gnuplot calculates the elements of U correctly from Eq. (76) including the error bars, but then 
apparently also determines an "assumed error" from an expression like Eq. (D2) but including the 
error bars, i.e. 

2 1 f yi- fi^i) V 



i=l 

^2 



Hence gnuplot 's a^^^ is just the chi-squared per degree of freedom. The error bar (squared) quoted 
by gnuplot is (f/)"^^ o"^ss, as in Eq. (D3). However, this is wrong since the error bars on the data 
points have already been included in calculating the elements of U , so the error on the fit parameter 
a should be {U)'^l^. Hence, 

to get correct error bars on fit parameters from gnuplot when there are error bars 
on the points, you have to divide gnuplot 's asymptotic standard errors by the square 
root of the chi-squared per degree of freedom (which gnuplot calls FIT_STDFIT and, 
fortunately, computes correctly). 
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I have checked this statement by comparing with results for Numerical Recipes routines, and also, 
for straight-line fits, by my own implementation of the formulae. It is curious that I found no 
hits on this topic when googling the internet. Can no one else have come across this problem? 
Correction of gnuplot error bars is implemented in the gnuplot scripts in Appendix G 

The need to correct gnuplot 's error bars applies to linear as well as non-linear models. 

I recently learned that error bars on fit parameters given by the routine curveJ it of python 
also have to be corrected in the same way. This is shown in two of the python scripts in appendix G. 
Curiously, a different python fitting routine, leastsq, gives the error bars correctly. 



Appendix E: The distribution of fitted parameters determined from simulated datasets 



In this section we derive the equation for the distribution of fitted parameters determined from 
simulated datasets, Eq. (100), assuming an arbitrary linear model, see Eq. (96). Projecting on to 
a single fitting parameter, as above, this corresponds to the lower figure in Fig. 2. 

We have one set of y- values, yf^\ for which the fit parameters are a^^\ We then generate an 
ensemble of simulated data sets, yf, assuming the data has Gaussian noise with standard deviation 
CTj centered on the actual data values yf'^. We ask for the probability that the fit to one of the 
simulated data sets has parameters . 

This probability distribution is given by 



N 



p{s') = n 



1 



27ro",- 



dyi exp 



2a? 



M 



det U , (El) 



a=l 



where the factor in curly brackets is (an integral over) the probability distribution of the data 
points , and the delta functions project out those sets of data points which have a particular set 
of fitted parameters, see Eq. (99). The factor of det ^7 is a Jacobian to normalize the distribution. 
Using the integral representation of the delta function, and writing explicitly the expression for Va 
from Eq. (98), one has 



dyi exp 




dka exp 




ika j ^ Uapa"^ 

/3 
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We carry out the y integrals by "completing the square", 



M 



N 



dyi exp 



yf - yf^ + ik • X{xi 
2^? 



exp 



2a 



(k-X{x.)) yf) 


X exp 


i ^ ka Ual3 









X (E4) 



detU. (E5) 



Doing the y^-integrals, the factors in curly brackets are equal to unity. Using Eqs. (97) and (98) 
and the fact that the are independent of the yf , we then get 

I — / dk„, 1 exn — ^ 



a, 13 



detU , 



where 



with 



r S _ 5 (0) 



(0) ^ Vj ^gja^tj 



1=1 



We do the ^-integrals by working in the basis in which U is diagonal. The result is 



"(a ) = —Tz — : — — exp 



(27r)W2 



a,/3 



Using Eq. (99) and the fact that U is symmetric we get our final result 




(E6) 



(E7) 



(E8) 



(E9) 



(ElO) 



which is Eq. (100), including the normalization constant in front of the exponential. 



Appendix F: The distribution of fitted parameters from repeated sets of measurements 

In this section we derive the equation for the distribution of fitted parameters determined in 
the hypothetical situation that one has many actual data sets. Projecting on to a single fitted 
parameter, this corresponds to the upper figure in Fig. 2. 

The exact value of the data is y*^"*^ = o*'^"" • X{xi), and the distribution of the y^ in an actual 
data set, which differs from y^^^ because of noise, has a distribution, assumed Gaussian here. 
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centered on with standard deviation (jj. Fitting each of these real data sets, the probabiUty 
distribution for the fitted parameters is given by 



N 



p{a) = n 



1 



i=l 



2'irai 



dm exp 



Vi - a"'^^ ■ X{xi) 
2^1 



M 



Yi^ t^Q/3a/3 -Va\ detU , 

(Fl) 



see Eq. (El) for an explanation of the various factors. Proceeding as in Appendix E we have 
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and doing the y- integrals by completing the square gives 
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Using Eq. (97) we then get 



f(«)=n(^/°l'i*")»p 

a=l ^ ^ 



a,/3 



Q,/3 



detC/. 



(F6) 



where 



(5a^ = ap- a|"° . 



The fc-integrals are done by working in the basis in which U is diagonal. The result is 



(F7) 




(F8) 



In other words, the distribution of the fitted parameters obtained from many sets of actual data, 
about the true value c^"^^^ is a Gaussian. Since we are assuming a linear model, the matrix of 
coefficients Uap is a constant, and so the distribution in Eq. (F8) is the same as in Eq. (ElO). 
Hence 
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For a linear model with Gaussian noise, the distribution of fitted parameters, obtained 
from simulated data sets, relative to value from the one actual data set, is the same as 
the distribution of parameters from many actual data sets relative to the true value, 
see Fig. 2. 

This result is also valid for a non-linear model if the range of parameter values needed is sufficiently 
small that the model can be represented by an effective one. It is usually assumed to be a reasonable 
approximation even if this condition is not fulfilled. 

Appendix G: Scripts for some data analysis and fitting tasks 

In this appendix I give sample scripts using perl, python and gnuplot for some basic data 
analysis and fitting tasks. I include output from the scripts when acting on certain datasets which 
are available on the web. 

Note "this_f ile_naine" refers to the name of the script being displayed (whatever you choose 
to call it.) 

1. Scripts for a jackknife analysis 

The script reads in values of x on successive lines of the input file and computes (a;^)/(x^)^, 
including an error bar computed using the jackknife method. 

a. Perl 

# ! /usr/bin/perl 
# 

# Usage: "this_f ile_naine data_file" 

# (make the script executable; otherwise you have to preface the command with "perl") 
# 

$n = 0; 

$x2_tot = 0; $x4_tot = 0; 
# 

# read in the data 
# 

while (<>) # Note this very convenient perl command which reads each line of 
# of each input file in the command line 

{ 

@line = split; 
$x2[$n] = $line[0]**2; 
$x4[$n] = $x2[$n]**2; 
$x2_tot += $x2 [$n] ; 
$x4_tot+= $x4 [$n] ; 
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$n++; 

} 
# 

# Do the jackknife estimates 
# 

for ($i =0; $i < $n; $i++) 
{ 

$x2_jack[$i] = ($x2_tot - $x2[$i]) / ($ii - 1); 
$x4_jack[$i] = ($x4_tot - $x4[$i]) / ($n - 1); 

} 

$x2_av = $x2_tot / $n; # Do the overall averages 
$x4_av = $x4_tot / $n; 
$g_av = $x4_av / $x2_av**2; 

$g_jack_av = 0; $g_jack_err =0; # Do the final jackknife estimate 

for ($i =0; $i < $n; $i++) 

{ 

$dg = $x4_jack[$i] / $x2_jack[$i] **2; 
$g_jack_av += $dg; 
$g_jack_err += $dg**2; 

} 

$g_jack_av /= $n; 
$g_jack_err /= $n; 

$g_jack_err = sqrt(($n - 1) * abs($g_jack_err - $g_jack_av**2) ) ; 
printf " Overall average is °/„8.4f\n", $g_av; 

printf " Jackknife average is °/o8.4f +/- y„6.4f \n" , $g_jack_av, $g_jack_err; 

Executing this file on the data in http : //physics .ucsc . edu/~peter/bad-honnef /data.HW2 gives 

Overall average is 1.8215 

Jackknife average is 1.8215 +/- 0.0368 



b. Python 



# 

# Program written by Matt Wittmann 
# 

# Usage: "python this_f ile_name data_file" 
# 

import fileinput 
from math import * 



x2 = [] ; x2_tot = 0. 
x4 = [] ; x4_tot = 0. 

for line in fileinput . input () : # read in each line in each input file. 

# similar to perl's while (<>) 

line = line. split 
x2_i = float (line [0] )**2 
x4_i = x2_i**2 
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x2 . append (x2_i) 
x4 . append ( x4_ i ) 
x2_tot += x2_i 
x4_tot += x4_i 



# put x2_i as the i-th element in an array x2 



n = len(x2) 
# 



# the number of lines read in 



# Do the jackknife estimates 
# 

x2_jack = [] 
x4_jack = [] 
for i in xrange(n): 

x2_ jack. append ( (x2_tot - x2 [i] ) / (n - 1)) 

x4_ jack. append ((x4_tot - x4[i]) / (n - 1)) 

x2_av = x2_tot / n # do the overall averages 

x4_av = x4_tot / n 
g_av = x4_av / x2_av**2 
g_jack_av =0.; g_jack_err = 0. 

for i in xrange(n): # do the final jackknife averages 

dg = x4_jack[i] / x2_jack [i] **2 
g_jack_av += dg 
g_jack_err += dg**2 

g_jack_av /= n 
g_jack_err /= n 

g_jack_err = sqrt((n - 1) * abs(g_jack_err - g_jack_av**2) ) 
print " Overall average is °/o8.4f" % g_av 

print " Jackknife average is 7o8.4f +/- °/o6.4f" % (g_jack_av, g_jack_err) 

The output is the same as for the perl script. 



# ! /usr/bin/perl 
# 

# Usage: "this_f ile_name data_file" 

# (make the script executable; otherwise preface the command with "perl") 



# Does a straight line fit to data in "data_file" each line of which contains 

# data for one point, x_i, y_i, sigma_i 



2. Scripts for a straight-line fit 



a. Perl, writing out the formulae by hand 



# 



# 

$n = 0; 
while (<>) 
{ 



# read in the lines of data 



Oline 



split; # split the line to get x_i, y_i, sigma_i 
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$x[$n] = $line [0] ; 
$y [$n] = $line [1] ; 
$err[$n] = $line[2] ; 

$err2 = $err [$n] **2 ; # compute the necessary sums over the data 

$s += 1 / $err2; 

$sumx += $x[$n] / $err2 ; 

$sumy += $y[$n] / $err2 ; 

$sumxx += $x [$n] *$x [$n] / $err2 ; 

$sumxy += $x[$n]*$y[$n] / $err2 ; 

$n++; 

} 

$delta = $s * $sumxx - $sumx * $sumx ; # compute the slope and intercept 
$c = ($sumy * $sumxx - $sumx * $sumxy) / $delta ; 
$m = ($s * $sumxy - $sumx * $sumy) / $delta ; 
$errm = sqrt($s / $delta) ; 
$errc = sqrt($sumxx / $delta) ; 

printf ("slope = yol0.4f +/- 7o7.4f \n" , $m, $errm) ; # print the results 
printf ("intercept = y„10.4f +/- 7o7.4f \n\n" , $c, $errc) ; 

$NDF = $n - 2; # the no. of degrees of freedom is n - no. of fit params 
$chisq =0; # compute the chi-squared 
for ($i =0; $i < $n; $i++) 
{ 

$chisq += (($y[$i] - $m*$x[$i] - $c)/$err [$i] )**2; 

} 

$chisq /= $NDF; 

printf ("chi squared / NDF = 7o7.41f \n" , $chisq) ; 

Acting with this script on the data in http : //physics .ucsc . edu/~peter/bad-honnef /data.HW3 

gives 

slope = 5.0022 +/- 0.0024 

intercept = 0.9046 +/- 0.2839 

chi squared / NDF = 1.0400 

b. Python, writing out the formulae by hand 

# 

# Program written by Matt Wittmann 
# 

# Usage: "python this_f ile_name data_file" 
# 

# Does a straight-line fit to data in "data_file", each line of which contains 

# the data for one point, x_i, y_i, sigma_i 
# 

import fileinput 



from math import * 



X = [] 

y = [] 

err = [] 

s = sumx = sumy = sumxx = sumxy = 0. 



for line in fileinput . input () : # read in the data, one line at a time 
line = line. split () # split the line 

x_i = float (line [0] ) ; x . append(x_i) 
y_i = float (line [1] ) ; y . append(y_i) 
err_i = float (line [2] ) ; err . append (err_i) 
err2 = err_i**2 

s += 1 / err2 # do the necessary sums over data points 

sumx += x_i / err2 
sumy += y_i / err2 
sumxx += x_i*x_i / err2 
sumxy += x_i*y_i / err2 

n = len(x) # n is the number of data points 

delta = s * sumxx - sumx * sumx # compute the slope and intercept 
c = (sumy * sumxx - sumx * sumxy) / delta 
m = (s * sumxy - sumx * sumy) / delta 
errm = sqrt(s / delta) 
errc = sqrt (sumxx / delta) 

print "slope = %10.4f +/- 7.7. 4f " % (m, errm) 
print "intercept = 7,10. 4f +/- 7.7. 4f \n" % (c, errc) 

NDF = n - 2 # the number of degrees of freedom is n - 2 

chisq = 0. 

for i in xrange(n): # compute chi-squared 

chisq += ((y[i] - m*x[i] - c) /err [i] ) **2 ; 

chisq /= NDF 

print "chi squared / NDF = 7.7. 41f " 7. chisq 



The results are identical to those from the perl script. 



c. Python, using a built-in routine from scipy 

# 

# Python program written by Matt Wittmann 
# 

# Usage: "python this_f ile_naine data_file" 
# 

# Does a straight-line fit to data in "data_file", each line of which cont 

# the data for one point, x_i, y_i, sigma_i. 
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# 

# Uses the built-in routine "curve_fit" in the scipy package. Note that this 

# requires the error bars to be corrected, as with gnuplot 
# 

from pylab import * 

from scipy . optimize import curve_fit 

fname = sys.argv[l] if len(sys . argv) > 1 else 'data.txt' 

X, y, err = np . loadtxt (fname , unpack=True) # read in the data 

n = len(x) 

pO = [5., 0.1] # initial values of parameters 

f = lambda x, c, m: c + m*x # define the function to be fitted 

# note python's lambda notation 
p, covm = curve_fit(f, x, y, pO, err) # do the fit 

c, m = p 

chisq = sum(((f(x, c, m) - y)/err)**2) # compute the chi-squared 

chisq /= n - 2 # divide by no. of DOF 

errc, errm = sqrt (diag (covm) /chisq) # correct the error bars 

print "slope = 7.10. 4f +/- y„7.4f " % (m, errm) 
print "intercept = 7.10. 4f +/- 7.7. 4f \n" 7. (c, errc) 
print "chi squared / NDF = 7.7. 41f " 7, chisq 



The results are identical to those from the above scripts. 



d. Gnuplot 

# 

# Gnuplot script to plot points, do a straight-line fit, and display the 

# points, fit, fit parameters, error bars, chi-squared per degree of freedom, 

# and goodness of fit parameter on the plot. 
# 

# Usage: "gnuplot this_f ile_name" 
# 

# The data is assumed to be a file "data.HW3", each line containing 

# information for one point (x_i, y_i, sigma_i) . The script produces a 

# postscript file, called here "HW3b.eps". 
# 

set size 1.0, 0.6 

set terminal postscript portrait enhanced font 'Helvetica, 16 ' 
set output "HW3b.eps" 

set fit errorvariables # needed to be able to print error bars 

f (x) = a + b * X # the fitting function 

fit f(x) "data.HWS" using 1:2:3 via a, b # do the fit 

set xlabel "x" 

set ylabel "y" 

ndf = FIT_NDF # Number of degrees of freedom 

chisq = FIT_STDFIT**2 * ndf # chi-squared 
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Q = 1 - igaiiima(0.5 * ndf, 0.5 * chisq) # the quality of fit parameter Q 
# 

# Below note how the error bars are (a) corrected by dividing by 

# FIT_STDFIT, and (b) are displayed on the plot, in addition to the fit 

# parameters, neatly formatted using sprintf . 
# 

set label sprintf ("a = 7,7. 4f +/- 7.7. 4f", a, a_err/FIT_STDFIT) at 100, 400 

set label sprintf ("b = 77. 4f +/- 7,7. 4f", b, b_err/FIT_STDFIT) at 100, 330 

set label sprintf ("{/Symbol c}"2 = 7o6.2f", chisq) at 100, 270 

set label sprintf ("{/Symbol c}-2/NDF = 7.6. 4f", FIT_STDFIT**2) at 100, 200 

set label sprintf ("Q = 7.9. 2e", Q) at 100, 130 

plot \ # Plot the data and fit 

"data.HWS" using 1:2:3 every 5 with errorbars notitle pt 6 Ic rgb "red" Iw 2, \ 
f(x) notitle Ic rgb "blue" Iw 4 It 1 

The plot below shows the result of acting with this gnuplot script on a the data in 
http://physics.ucsc.edu/~peter/bad-honnef/data.HW3. The results agree with those of the 
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other scripts. ^ 

3. Scripts for a fit to a non-linear model 

We read in lines of data each of which contains three entries Xi,yi and cjj. These are fitted to 
the form 

y = T, + Ajx^ , (Gl) 




to determine the best values of To ^ and lo. 
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a. Python 

# 

# Python program written by Matt Wittmann 
# 

# Usage: "python this_f ile_naiiie data_file" 
# 

# Does a fit to the non-linear model 
# 

# y = Tc + A / x**w 
# 

# to the data in "data_file", each line of which contains the data for one point, 

# x_i, y_i, sigma_i. 
# 

# Uses the built-in routine "curve_fit" in the scipy package. Note that this 

# requires the error bars to be corrected, as with gnuplot 
# 

from pylab import * 

from scipy . optimize import curve_fit 
from scipy. stats import chi2 

fname = sys.argv[l] if len(sys . argv) > 1 else 'data.txt' 

X, y, err = np . loadtxt (fname , unpack=True) # read in the data 

n = len(x) # the number of data points 



pO = [-0.25, 0.2, 2.8] 

f = lambda x, Tc, w. A: Tc + A/x**w 

p, covm = curve_fit(f, x, y, pO, err) 
Tc, w, A = p 

chisq = sum(((f(x, Tc, w. A) - y)/err)**2) 

ndf = n -len(p) 

Q = 1. - chi2.cdf (chisq, ndf) 

chisq = chisq / ndf 

Tcerr, werr, Aerr = sqrt(diag(covm) /chisq) 

print 'Tc = 7,10. 4f +/- 7,7. 4f' % (Tc, Tcerr) 

print 'A = 7,10. 4f +/- 7,7. 4f' % (A, Aerr) 

print 'w = 7,10. 4f +/- 7,7. 4f' % (w, werr) 

print 'chi squared / NDF = 7,7. 41f' 7, chisq 

print 'Q = 7,10. 4f' 7, Q 



# initial values of parameters 

# define the function to be fitted 

# note python's lambda notation 

# do the fit 

# compute the chi-squared 

# no. of degrees of freedom 

# compute the quality of fit parameter Q 

# compute chi-squared per DOF 

# correct the error bars 



When applied to the data in http://physics.ucsc.edu/~peter/bad-honnef/data.HW4 the out- 
put is 

Tc = -0.2570 +/- 1.4775 

A = 2.7878 +/- 0.8250 

w = 0.2060 +/- 0.3508 

chi squared / NDF = 0.2541 
Q = 0.9073 
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b. Gnuplot 

# 

# Gnuplot script to plot points, do a fit to a non-linear model 
# 

# y = Tc + A / x**w 
# 

# with respect to Tc, A and w, and display the points, fit, fit parameters, 

# error bars, chi-squared per degree of freedom, and goodness of fit parameter 

# on the plot . 
# 

# Here the data is assumed to be a file "data.HW4", each line containing 

# information for one point (x_i, y_i, sigma_i) . The script produces a 

# postscript file, called here "HW4a.eps". 
# 

set size 1.0, 0.6 

set terminal postscript portrait enhanced 
set output "HW4a.eps" 

set fit errorvariables # needed to be able to print error bars 

f(x) = Tc + A / x**w # the fitting function 

set xlabel " l/x~{/Symbol w}" 
set ylabel "y" 

set label "y = T_c + A / x"{/Symbol w}" at 0.1, 0.7 

Tc = 0.3 # need to specify initial values 

A = 1 

w = 0.2 

fit f(x) "data.HW4" using 1:2:3 via Tc, A, w # do the fit 
set xrange [0.07:0.38] 
g(x) = Tc + A * X 
h(x) = + * X 

ndf = FIT_NDF # Number of degrees of freedom 

chisq = FIT_STDFIT**2 * ndf # chi-squared 

Q = 1 - igamma(0.5 * ndf, 0.5 * chisq) # the quality of fit parameter Q 
# 

# Below note how the error bars are (a) corrected by dividing by 

# FIT_STDFIT, and (b) are displayed on the plot, in addition to the fit 

# parameters, neatly formatted using sprintf . 
# 

set label sprintf ("T_c = 7.5. 3f +/- yo5.3f",Tc, Tc_err/FIT_STDFIT) at 0.25, 0.33 

set label sprintf ("{/Symbol w} = 7,5. 3f +/- 7o5.3f",w, w_err/FIT_STDFIT) at 0.25, 0.27 

set label sprintf ("A = 7o5.2f +/- 7.5. 2f", A, A_err/FIT_STDFIT) at 0.25, 0.21 

set label sprintf ("{/Symbol c}"2 = 7.5. 2f", chisq) at 0.25, 0.15 

set label sprintf ("{/Symbol c}-2/NDF = 7.5. 2f", FIT_STDFIT**2) at 0.25, 0.09 

set label sprintf ("Q = 7o5.2f", Q) at 0.25, 0.03 

# 

# Plot the data and the fit 
# 

plot "data.HW4" using (l/$l**w):2:3 with errorbars notitle Ic rgb "red" Iw 3 pt 8 ps 1.5, \ 
g(x) notitle Ic rgb "blue" Iw 3 It 2 , \ 
h(x) notitle It 3 Iw 4 
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The plot below shows the result of acting with this gnuplot script on the data at 
http://physics.ucsc.edu/~peter/bad-honnef/data.HW4. The results agree with those of the 



python script above. 
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